Removing batch effects for prediction problems with frozen surrogate variable analysis
Authors
Abstract
Batch effects are responsible for the failure of promising genomic prognostic signatures, major ambiguities in published genomic results, and retractions of widely publicized findings. Batch effect corrections have been developed to remove these artifacts, but they are designed for population studies. Genomic technologies, however, are beginning to be used in clinical applications where samples are analyzed one at a time for diagnostic, prognostic, and predictive purposes. There are currently no batch correction methods developed specifically for prediction. In this paper, we propose a new method called frozen surrogate variable analysis (fSVA) that borrows strength from a training set for individual sample batch correction. We show that fSVA improves prediction accuracy in simulations and in public genomic studies. fSVA is available as part of the sva Bioconductor package.
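To make "borrowing strength from a training set" concrete, here is a minimal NumPy sketch of the frozen idea. Surrogate variables and their gene-wise loadings are learned once on a training matrix and then frozen. A single new sample is corrected by projecting it onto those frozen loadings and subtracting the fitted batch component. This is a simplified illustration, not the sva package's `fsva()` implementation. In particular, the surrogate variables here come from a plain SVD of outcome-adjusted residuals rather than SVA's iterative weighting. The function names `train_frozen_sva` and `correct_new_sample` are hypothetical.

```python
import numpy as np


def train_frozen_sva(train, outcome, n_sv):
    """Learn frozen gene-wise quantities from a genes x samples training matrix.

    Simplification (assumption): surrogate variables are the top right singular
    vectors of the residuals after regressing out the outcome, not the
    iteratively reweighted estimate used by SVA itself.
    """
    n = train.shape[1]
    mod = np.column_stack([np.ones(n), outcome])  # n x 2 design: intercept + outcome
    # Least-squares fit of every gene on the design; coef has shape 2 x genes
    coef, *_ = np.linalg.lstsq(mod, train.T, rcond=None)
    resid = train - (mod @ coef).T  # genes x n residual matrix
    _, _, vt = np.linalg.svd(resid, full_matrices=False)
    sv = vt[:n_sv].T  # n x n_sv surrogate variables (mean zero across samples)
    # Refit each gene on outcome + surrogate variables, then keep only the SV loadings
    full = np.column_stack([mod, sv])
    full_coef, *_ = np.linalg.lstsq(full, train.T, rcond=None)
    gene_means = train.mean(axis=1)
    sv_loadings = full_coef[mod.shape[1]:].T  # genes x n_sv, frozen after training
    return gene_means, sv_loadings


def correct_new_sample(x_new, gene_means, sv_loadings):
    """Remove the estimated batch component from one new sample (length-genes vector)."""
    centered = x_new - gene_means
    # Estimate this sample's surrogate-variable values by least-squares projection
    # onto the frozen loadings; the training set itself is never refit.
    sv_new, *_ = np.linalg.lstsq(sv_loadings, centered, rcond=None)
    return x_new - sv_loadings @ sv_new
```

Only the gene means and frozen loadings are stored. Each new sample can therefore be corrected without reprocessing the training set, which matches the one-sample-at-a-time clinical setting described above.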
Similar resources
The sva package for removing batch effects and other unwanted variation in high-throughput experiments
Heterogeneity and latent variables are now widely recognized as major sources of bias and variability in high-throughput experiments. The most well-known source of latent variation in genomic experiments is batch effects: when samples are processed on different days, in different groups or by different people. However, there are also a large number of other variables that may have a major impac...
Preserving biological heterogeneity with a permuted surrogate variable analysis for genomics batch correction
MOTIVATION: Sample source, procurement process and other technical variations introduce batch effects into genomics data. Algorithms to remove these artifacts enhance differences between known biological covariates, but also carry potential concern of removing intragroup biological heterogeneity and thus any personalized genomic signatures. As a result, accurate identification of novel subtypes ...
svaseq: removing batch effects and other unwanted noise from sequencing data
It is now known that unwanted noise and unmodeled artifacts such as batch effects can dramatically reduce the accuracy of statistical inference in genomic experiments. These sources of noise must be modeled and removed to accurately measure biological variability and to obtain correct statistical inference when performing high-throughput genomic analysis. We introduced surrogate variable analys...
Prediction of the Operating Conditions in a Batch Distillation Column Using a Shortcut Method
A shortcut procedure, used as a quick, easy-to-use method for design and simulation of multicomponent batch distillation, is applied to predict the operating conditions for recovering xylene from solvent in an existing batch distillation column in a benzol refinery. The procedure can be used to investigate the effect of the operating parameters on the operation of column for three possible modes of batch d...
A comparison of different network based modeling methods for prediction of the torque of a SI engine equipped with variable valve timing
Nowadays, due to the increasing complexity of IC engines, calibration becomes more demanding and surrogate models are needed to investigate engine behavior. Accordingly, many black box modeling approaches have been used in this context, among which network based models are some of the most powerful thanks to their flexible structures. In this paper four network base...
Journal title:
Volume 2, Issue:
Pages: -
Publication date: 2014